VIDEOPHONE TERMINAL 

Background of the Invention 

Field of the Invention 
The present invention relates to a videophone
terminal that can convey the emotions and impressions of a
user so they can be easily understood by a co-communicant.

Description of the Related Art 

In a videophone system constituted by a plurality
of videophone terminals and a network, a picture obtained
by a videophone terminal is transmitted, with speech, to
the terminal of a co-communicant across the network.
Therefore, while a user is conversing with a co-communicant
at a remote location, the user can simultaneously watch the
face of the co-communicant. And since the two can see and
hear each other during their conversation, so that not only
the tone of voice of a co-communicant but also his or her
appearance is conveyed, a more realistic and higher-level
communication process can be performed.

However, since some users do not want their
pictures transmitted to their co-communicants' terminals
and others do not think that conversing while viewing
pictures of themselves or their co-communicants is amusing,
another technique has been developed whereby the feature
points of individual facial features, such as eyebrows,
eyes, the nose and the mouth, are extracted from a picture
of a user, and based on the feature points, a virtual
character that resembles the face of the user is generated
and is transmitted as the personality of the user to the
terminal of a co-communicant (Patent document 1:
JP-A-2002-511617; and Patent document 2: JP-A-2002-511620).

According to this technique, first, a picture of a
user's face (hereinafter referred to as a "face picture")
is examined to identify the area that corresponds to the
face of the user, and points (hereinafter referred to as
"feature points") representing individual facial features,
such as the eyebrows, the eyes, the nose and the mouth, are
extracted from the face picture, as indicated in Fig. 1 for
explaining a face picture and the individual feature
points. Then, in accordance with the feature points, a
virtual character resembling the face of the user is
generated based on an average face obtained by
averaging the individual facial features. More
specifically, differences between the extracted feature
points and the feature points for an average face are
calculated, the differential data are reflected on the
average face, and a virtual character resembling the face
of the user is generated. In Fig. 2, a virtual character
that resembles the face of a user is presented.

Next, the movements of the individual feature
points on the face picture of the user are tracked and are
reflected on the virtual character. In this manner, since
the movement of each facial feature associated with a
change in the facial expression of the user is linked to
the movement of each facial feature of the virtual
character, the facial expression of the virtual character
changes in consonance with a change in the user's facial
expression. The virtual character, however, need not
always resemble the face of the user, and when the
movements of the feature points in the face picture of the
user are reflected on a completely different virtual
character, the facial expression of the virtual character
may be varied as the facial expression of the user changes.

Furthermore, when all the facial features are moved
in the same direction along the coordinate axis of the face
picture, it can be assumed that the entire face has been
moved. Therefore, any movement by the user, such as the
nodding, tilting or shaking of the user's head, can be
reflected on the virtual character.
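
The whole-face test described above can be illustrated with a
minimal Python sketch. It is not taken from the cited documents:
the function name `whole_face_shift`, the `tolerance` and
`min_shift` thresholds, and the use of the mean displacement as
the common movement are all assumptions made for illustration.
The feature point dictionaries are assumed to be non-empty and
to share the same keys.

```python
from math import hypot
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]


def whole_face_shift(
    prev: Dict[str, Point],
    curr: Dict[str, Point],
    tolerance: float = 2.0,   # hypothetical: allowed deviation from the mean, in pixels
    min_shift: float = 1.0,   # hypothetical: smallest movement treated as a head movement
) -> Optional[Point]:
    """Return the common (dx, dy) if every feature point moved alike, else None."""
    deltas = [(curr[n][0] - prev[n][0], curr[n][1] - prev[n][1]) for n in prev]
    mean_dx = sum(dx for dx, _ in deltas) / len(deltas)
    mean_dy = sum(dy for _, dy in deltas) / len(deltas)

    # A negligible average movement is not treated as a head movement.
    if hypot(mean_dx, mean_dy) < min_shift:
        return None

    # Every feature must follow the average movement closely; otherwise the
    # change is an expression change (e.g. only the mouth moved).
    for dx, dy in deltas:
        if abs(dx - mean_dx) > tolerance or abs(dy - mean_dy) > tolerance:
            return None
    return (mean_dx, mean_dy)
```

When a displacement is returned, it could be applied to the
virtual character as a whole; when `None` is returned, the
per-feature movements would be treated as an expression change.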

As is described above, according to the
conventional technique, since the movements of individual
facial features associated with a change in the expression
of the user are linked to the movements of the facial
features for the virtual character, the expression of the
virtual character is changed in consonance with the
expression of the user. Further, since any movement of the
user's head is reflected on the virtual character, the
virtual character moves in the same way as the user's head
when it is nodded, tilted or shaken.

The conventional technique, however, merely
provides for the direct reflection on a virtual character
of changes in a user's facial expression and movements of
the user's head, and emotions or impressions that are not
accompanied by speech cannot be expressed by using the
facial expressions and movements of a virtual character.
To transmit the emotions and impressions of the user so
that they can be easily understood by a co-communicant,
therefore, exaggerating changes in the user's facial
expression or using representative symbols is better than
merely having the virtual character directly reflect
changes in the facial expression of the user. Furthermore,
since the facial expressions or movements of the virtual
character are more amusing with this method, its
entertainment value is superior.

Summary of the Invention 

In order to overcome these problems, it is one
objective of the present invention to provide a videophone
terminal that can transmit the emotions or impressions of a
user so that a co-communicant can easily understand them.

To achieve this objective, according to one aspect
of the present invention, a videophone terminal is provided
for communicating, over a network, with a different
terminal using speech and pictures, including a virtual
character generated based on the face of a person, wherein,
when a predetermined operation is performed using a
keyboard, or when a predetermined keyword is identified in
the speech of a user, a picture wherein the appearance of
the virtual character has been changed, or another,
predetermined picture, is transmitted to the different
terminal. Therefore, emotions or impressions can be
expressed that cannot be conveyed merely by using the
facial expressions and the movements of the virtual
character. As a result, the emotions or the impressions of
a user can be transmitted to and easily understood by a
co-communicant.

According to the videophone terminal of this
invention, a picture wherein the appearance of a virtual
character is changed is a picture wherein the individual
features of the face of a virtual character or the size of
the entire face is changed, or a picture wherein a
predetermined pattern is added to the virtual character.
In this case, since the facial expressions and the
movements of the virtual character will be more amusing,
the entertainment value of video communication, for which
the virtual character is used, will be increased.

Further, according to the videophone terminal of
the invention, when a keyboard is used to perform a
predetermined operation, a predetermined sound effect is
transmitted to the different terminal, instead of, or while
superimposed on, the speech of the user. Therefore, the
emotions or the impressions of the user that cannot be
conveyed using only speech or pictures can be expressed
using sounds.

According to another aspect of the invention, a
videophone terminal is provided for communicating, across a
network, with a different terminal using speech and
pictures, including a virtual character that is generated
based on the face of a person, wherein, in a pending state
in which a picture other than the virtual character is
transmitted, when a predetermined operation is performed
for canceling the pending state, or when the face of the
user who was using the terminal before the pending state is
recognized in a picture obtained during the pending state,
a picture and speech are transmitted to recover and
redisplay the virtual character on a screen, accompanied by
a predetermined sound, and the pending state is canceled.
Therefore, a co-communicant using the other terminal can
visually ascertain that the screen has recovered from the
pending state.

According to an additional aspect of the invention,
provided is a videophone terminal, for communicating,
across a network, with a different terminal by using speech
and pictures, including a virtual character that is
generated based on the face of a person, wherein when a
predetermined operation is performed at the end of
communication with the different terminal, a picture
wherein the virtual character is disappearing from a screen
is transmitted to the different terminal before a line is
disconnected; and wherein a picture provided by a first
predetermined operation differs from a picture provided by
a second predetermined operation. Therefore, in accordance
with the contents of the picture, the impression of a
conversation engaged in by a user can be transmitted to the
co-communicant.

Brief Description of the Drawings 

Fig. 1 is a diagram for explaining a face picture 
and feature points; 

Fig. 2 is a diagram for explaining a virtual
character resembling the face of a user; and

Fig. 3 is a block diagram showing the configuration
of a videophone terminal according to one mode of the
present invention.

In the drawings, reference numeral 101 refers to
a camera; 103 to a video processor; 105 to a microphone;
107 to a loudspeaker; 109 to an audio processor; 111 to a
virtual character generator; 113 to a display unit; 115 to
a keyboard; 117 to a storage unit; 119 to a central
processing unit; 121 to a wireless unit; and 123 to an
antenna.

Detailed Description of the Preferred Embodiments 

A videophone terminal according to the present invention
will now be described with reference to the drawings.

A videophone terminal according to one mode of this
invention is a portable telephone, or a communication
terminal such as a PHS or a PDA, that includes a camera for
obtaining moving pictures and static pictures (both of
which are hereinafter referred to simply as "pictures").
The videophone terminal can be used as a video telephone
for exchanging pictures and speech, over a network, with
another videophone terminal. A picture exchanged between
videophone terminals during a videophone conversation may
be not only a picture obtained by a camera, but also a
picture of a virtual character that is generated based on a
picture of a user taken by the camera. In this mode, an
example is employed wherein a picture of the virtual
character is received.

The processing for generating a virtual character
will now be described. The videophone terminal for this
mode identifies the area of the face in a picture of a user
taken by the camera. Then, points (hereinafter referred to
as feature points) representing facial features, such as
the eyebrows, the eyes, the nose and the mouth, are
extracted from the face picture. Fig. 1 is a diagram for
explaining the locations on a picture of the feature points
of the individual facial features. Since the eyebrows, the
eyes, the nose and the mouth, which are the main facial
features, change in complicated ways depending on the
facial expression, the parts of these features that move
relative to other parts as an expression changes are
extracted as feature points.

Next, a virtual character, resembling the face of
the user, is generated based on an average face formed by
averaging the feature points corresponding to the
individual facial features of the user. More specifically,
differences between the extracted feature points and the
feature points for the average face are calculated, and the
obtained differential data are reflected on the character
of the average face. In this manner, a virtual character
resembling the face of the user is generated. Fig. 2 is a
diagram for explaining a virtual character resembling the
face of the user.
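
The difference-based generation step can be illustrated with a
minimal Python sketch. The function name `build_character`, the
dictionary layout of the feature points and the `gain` factor
are assumptions introduced for illustration; the specification
does not state how the differential data are stored or scaled.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]


def build_character(
    user_points: Dict[str, Point],
    average_points: Dict[str, Point],
    character_points: Dict[str, Point],
    gain: float = 1.0,  # hypothetical: values above 1.0 exaggerate the likeness
) -> Dict[str, Point]:
    """Shift each point of the average-face character by the user's offset."""
    result = {}
    for name, (cx, cy) in character_points.items():
        ux, uy = user_points[name]
        ax, ay = average_points[name]
        # Differential data: how far the user's feature lies from the average face.
        result[name] = (cx + gain * (ux - ax), cy + gain * (uy - ay))
    return result
```

With `gain` left at 1.0 and the character drawn on the average
face itself, each character point lands exactly on the user's
feature point; a different character template would keep its
own proportions while inheriting the user's offsets.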

The feature points on the picture of the face of
the user are tracked, and the movement of each feature in
the picture is reflected on the virtual character. When
all the features of the face are moved in the same
direction, along the coordinate axis of the face picture,
it can be assumed that the whole face is moving.
Therefore, the nodding, or tilting or shaking of the head
of the user can be reflected on the virtual character.

The configuration of the videophone terminal for
this mode will now be described while referring to Fig. 3.
As is shown in Fig. 3, the videophone terminal for this
mode comprises: a camera 101, a video processor 103, a
microphone 105, a loudspeaker 107, an audio processor 109,
a virtual character generator 111, a display unit 113, a
keyboard 115, a storage unit 117, a central processing unit
119, a wireless unit 121 and an antenna 123.

The video processor 103 analyzes a picture taken by
the camera 101, identifies the location of a face in the
picture, and extracts feature points. The audio processor
109 performs a predetermined process for the speech of a
user input through the microphone 105, or processes speech
data for a co-communicant received from the
co-communicant's videophone terminal, and outputs the speech
through the loudspeaker 107. The processing performed by
the audio processor 109 includes the analysis of
elements that are speech characteristics, such as the
volume, the tone and the pitch, and this analysis is
performed for the speech of both the user and the
co-communicant.

The virtual character generator 111 generates a
virtual character based on the feature points extracted by
the video processor 103, and reflects, on the virtual
character, the facial expressions and the movements of the
user obtained by the camera 101. The virtual character
generator 111 may change part or all of the virtual
character in accordance with an instruction received from
the central processing unit 119. Based on schedule
information and date information stored in the storage unit
117, the virtual character generator 111 designates a
predetermined picture as the background for the virtual
character to be displayed on the display unit 113. The
background changes depending on the day or the current
circumstances; for example, the picture of a cake may be
designated as the background for the birthday of a user,
the picture of a tiered platform, carpeted in red, for
dolls may be designated for March 3rd (the Girls' Doll
Festival), or the picture of a carp streamer may be
designated for May 5th (the Boys' Festival).
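
The date-dependent background selection can be sketched in
Python as follows. The file names, the `pick_background`
function and the rule that a birthday takes priority over a
fixed festival date are assumptions for illustration only; the
specification gives just the three example pictures.

```python
from datetime import date
from typing import Optional

# Hypothetical picture names for the examples given in the text.
FIXED_DATE_BACKGROUNDS = {
    (3, 3): "doll_platform.png",   # March 3rd, the Girls' Doll Festival
    (5, 5): "carp_streamer.png",   # May 5th, the Boys' Festival
}


def pick_background(
    today: date,
    birthday: Optional[date] = None,
    default: str = "plain.png",
) -> str:
    """Choose a background picture from date and schedule information."""
    # Assumption: the user's birthday overrides a festival on the same day.
    if birthday is not None and (today.month, today.day) == (birthday.month, birthday.day):
        return "cake.png"
    return FIXED_DATE_BACKGROUNDS.get((today.month, today.day), default)
```

For example, `pick_background(date(2024, 5, 5))` selects the
carp streamer picture, while a day with no entry falls back to
the default background.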

The storage unit 117 is used to store a program
related to a change in the expression of the virtual
character and in the movement of the virtual character,
predetermined picture and speech data, and scheduling
information and date information for a user.

The keyboard 115 is used to transmit, to the
central processing unit 119, an instruction to shift to a
pending mode that will be described later, an instruction
for line disconnection, and other instructions. The
central processing unit 119 performs video and audio
processing in accordance with an instruction entered using
the keyboard 115 or based on a keyword, predetermined
processing upon the connection/disconnection of the line
and at the start/cancellation of the pending mode, and
compression/decompression processing for video data and
audio data. The wireless unit 121 modulates or demodulates
the video and audio data, and exchanges signals via the
antenna 123.

While taking the foregoing explanation into
consideration, a detailed explanation will be given for the
videophone terminal of the invention by describing, in
order, a first embodiment, a second embodiment and a third
embodiment.

[First Embodiment] 

According to the first embodiment, during a video
conversation using a virtual character, when a user employs
the keyboard 115 in Fig. 3 to perform a predetermined
operation, or when the audio processor 109 identifies a
predetermined keyword in the speech of the user, a picture
wherein the appearance of the virtual character is changed,
or a completely different picture, is provided.

To change the appearance of the virtual character,
the size of each feature on the face of the virtual
character, or the size of the entire face, may be changed,
or a pattern that expresses an emotion may be added, e.g.,
vertical lines may be added to the eyes of the virtual
character, or the color of the cheeks may be changed to
red. Surprise can be expressed by making only the eyes of
the virtual character larger than usual, and anger can be
expressed by making the entire face larger than usual and
turning the face red.

Available pictures that are completely different
include those of an exclamation mark (!) and a question
mark (?). The exclamation mark can be used to express
admiration, and the question mark can be used to
express doubt.

As another available picture, a picture showing a
thumbs-up gesture may be stored in advance in the storage
unit 117 in correlation with the keyword, "All-right!". Then,
when the audio processor 109 identifies this keyword in the
speech of the user, the central processing unit 119 can
read, from the storage unit 117, the picture showing the
thumbs-up gesture and display this picture instead of, or
while superimposed on, the picture of the virtual
character. Not only a static but also an animated picture
may be used.
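
The correlation between triggers and pictures can be sketched
as two lookup tables in Python. The `Effect` record, the key
assignments "1" to "4" and the effect names are hypothetical;
only the keyword "All-right!" and the kinds of change come from
the text. Keyword identification itself is assumed to have
already produced a text transcript.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Effect:
    kind: str              # "appearance" (change the character) or "picture"
    name: str              # hypothetical identifier of the stored data
    overlay: bool = False  # True: superimpose on the character; False: replace it


# Keyword from the text; stored lower-case for matching.
KEYWORD_EFFECTS = {"all-right!": Effect("picture", "thumbs_up", overlay=True)}

# Hypothetical key assignments for the examples in the text.
KEY_EFFECTS = {
    "1": Effect("appearance", "enlarge_eyes"),       # surprise
    "2": Effect("appearance", "enlarge_red_face"),   # anger
    "3": Effect("picture", "exclamation_mark"),      # admiration
    "4": Effect("picture", "question_mark"),         # doubt
}


def effect_for_speech(transcript: str) -> Optional[Effect]:
    """Return the effect correlated with the first keyword found, if any."""
    text = transcript.lower()
    for keyword, effect in KEYWORD_EFFECTS.items():
        if keyword in text:
            return effect
    return None


def effect_for_key(key: str) -> Optional[Effect]:
    """Return the effect correlated with a keyboard operation, if any."""
    return KEY_EFFECTS.get(key)
```

Keeping both triggers in tables means a new keyword or key can
be registered by adding an entry, without touching the lookup
functions.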

Likewise, predetermined sound effects can be stored
in advance in the storage unit 117 in correlation with
predetermined keyboard operations. When a predetermined
operation is performed using the keyboard 115, the central
processing unit 119 may read from the storage unit 117 data
for a corresponding sound effect, and reproduce the sound
instead of, or while overlapping, the speech of the user or
the user's co-communicant.

As is described above, according to this
embodiment, when the user performs a predetermined
operation using the keyboard 115, or when the audio
processor 109 identifies a predetermined keyword, a virtual
character showing an expression or movement that differs
from the usual, or a completely different picture, is
displayed. Therefore, an emotion or an impression can be
conveyed that cannot be conveyed by using only the facial
expression and movement of the virtual character. In this
case, since the facial expression and the movement of
the virtual character are themselves very amusing, the
entertainment provided by video communication that uses the
virtual character can be especially enhanced. In addition,
since predetermined sound effects can be reproduced in
accordance with the keyboard operation, an emotion
or an impression of the user can be conveyed that cannot
be conveyed by using only speech and the picture.
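
Reproducing a sound effect instead of, or superimposed on, the
speech can be sketched in Python on lists of integer samples.
The function name `apply_sound_effect`, the 16-bit sample range
and the rule that speech resumes once a replacing effect has
ended are assumptions for illustration; the specification does
not describe the audio format.

```python
from typing import List


def apply_sound_effect(
    speech: List[int],
    effect: List[int],
    superimpose: bool = True,
    limit: int = 32767,  # assumption: signed 16-bit samples
) -> List[int]:
    """Mix a sound effect into speech, or play it in place of the speech."""
    length = max(len(speech), len(effect))
    out = []
    for i in range(length):
        s = speech[i] if i < len(speech) else 0
        e = effect[i] if i < len(effect) else 0
        if superimpose:
            mixed = s + e                        # overlap effect and speech
        else:
            mixed = e if i < len(effect) else s  # effect replaces speech while it lasts
        # Clip to the sample range so that loud overlaps do not wrap around.
        out.append(max(-limit - 1, min(limit, mixed)))
    return out
```

Clipping is the simplest way to keep the mixed signal in range;
a real terminal might instead lower the speech level while the
effect plays.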

[Second Embodiment]

According to the second embodiment, during a video
conversation for which the virtual character is used, the
operating mode may be shifted to a pending mode, and when
the pending mode is canceled, the virtual character is
recovered and redisplayed on a screen, accompanied by the
playing of a melody. This occurs either when the user,
while the operating mode is the pending mode, depresses the
pending mode button on the keyboard 115 in Fig. 3, and the
central processing unit 119 detects this depression and
cancels the pending mode, or when feature points, which are
extracted by the video processor 103 from a picture
obtained by the camera 101 in the pending mode, correspond
to those of the user who was engaged in the conversation
before the pending mode was entered, and the central
processing unit 119 cancels the pending mode.

At this time, the central processing unit 119
executes a predetermined program, reads predetermined
melody data from the storage unit 117, and while playing a
melody, displays the virtual character that was displayed
before the pending mode was entered. It should be noted,
however, that a specific period of time is required,
following the cancellation of the pending mode, before the
virtual character can actually be displayed because, based
on the feature points that are extracted by the video
processor 103 from the picture obtained by the camera 101,
the virtual character generator 111 must reflect the
expression and the movement of the user on the virtual
character. Therefore, the picture displayed on the screen
during this waiting period is a picture of the virtual
character, with an unchanging expression, performing a
predetermined movement accompanied by a melody.
An example predetermined movement is one wherein the
virtual character opens a door and enters a room.

As is described above, according to this
embodiment, since a picture is displayed, when the pending
mode is canceled, wherein the virtual character performs a
predetermined movement accompanied by a predetermined
melody, the user's co-communicant can visually apprehend
that the operating mode has recovered from the pending
state.
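
The two ways of canceling the pending mode can be sketched as a
small state machine in Python. The class `PendingController`,
the action names it returns and the per-point `tolerance` used
to decide that the same face has reappeared are assumptions for
illustration; a practical face comparison would need to allow
for position and scale.

```python
from enum import Enum, auto
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


class Mode(Enum):
    TALKING = auto()
    PENDING = auto()


class PendingController:
    def __init__(self, user_points: Dict[str, Point], tolerance: float = 3.0):
        self.mode = Mode.TALKING
        self.user_points = user_points  # feature points stored before pending mode
        self.tolerance = tolerance      # hypothetical matching threshold, in pixels

    def press_pending_button(self) -> List[str]:
        """The pending mode button enters pending mode, or cancels it."""
        if self.mode is Mode.TALKING:
            self.mode = Mode.PENDING
            return ["send_pending_picture"]
        return self._resume()

    def on_camera_frame(self, points: Optional[Dict[str, Point]]) -> List[str]:
        """Cancel pending mode when the original user's face is seen again."""
        if self.mode is Mode.PENDING and points is not None and self._same_face(points):
            return self._resume()
        return []

    def _same_face(self, points: Dict[str, Point]) -> bool:
        return all(
            name in points
            and abs(points[name][0] - x) <= self.tolerance
            and abs(points[name][1] - y) <= self.tolerance
            for name, (x, y) in self.user_points.items()
        )

    def _resume(self) -> List[str]:
        self.mode = Mode.TALKING
        # The entrance animation covers the delay before live tracking resumes.
        return ["play_melody", "send_entrance_animation", "send_live_character"]
```

Both cancellation paths end in the same `_resume` step, so the
co-communicant sees the same melody and entrance animation
whichever trigger was used.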

[Third Embodiment]

According to the third embodiment, at the end of a
video conversation using the virtual character, a
predetermined picture wherein the virtual character
disappears from the screen is displayed in accordance with
a button selected by the user on the keyboard 115 in Fig.
3, and the line is disconnected. An example predetermined
picture is a picture wherein the virtual character
disappears from the screen while holding flowers, or a
picture wherein the virtual character is crushed by
pressure applied to its head.

It should be noted that the impression provided to the
user's co-communicant differs depending on the contents of
the picture. Therefore, if the user has enjoyed the
conversation, at the end of the conversation, the user may
depress a predetermined button that presents a picture
wherein the virtual character disappears from the screen
while holding flowers. On the other hand, if the user has
no special reaction to the conversation, at the end of the
conversation, the user may depress another button that
presents a picture wherein the virtual character is crushed
by pressure applied to its head. When at the end of the
conversation a predetermined button is depressed in this
manner, the central processing unit 119, before the line is
disconnected, reads from the storage unit 117 and transmits
data for a picture corresponding to the depressed button.
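
The ordering described above, in which the picture is
transmitted before the line is disconnected, can be sketched in
Python. The button names, the picture names and the callback
parameters `send_picture` and `disconnect` are hypothetical
stand-ins for the terminal's actual transmission functions.

```python
from typing import Callable

# Hypothetical button-to-picture table for the two examples in the text.
FAREWELL_PICTURES = {
    "enjoyed": "leave_with_flowers",
    "neutral": "crushed_from_above",
}


def end_call(
    button: str,
    send_picture: Callable[[str], None],
    disconnect: Callable[[], None],
) -> None:
    """Transmit the farewell picture for the button, then disconnect the line."""
    picture = FAREWELL_PICTURES.get(button)
    if picture is not None:
        send_picture(picture)  # must happen while the line is still connected
    disconnect()
```

A button with no registered picture simply disconnects the
line, which matches an ordinary end of call.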

As is described above, according to this
embodiment, at the end of a conversation a picture is
displayed in consonance with the depression of a button,
and using the contents of the picture, the impression the
user received from the conversation is transmitted to the
co-communicant.

For each of the embodiments, the video processor
103, the audio processor 109, the virtual character
generator 111 and the central processing unit 119 of the
videophone terminal may be operated by the execution of
programs.

As is described above, according to the videophone
terminal of the invention, the emotions and impressions of
a user can be transmitted to and easily understood by a
co-communicant.
